Riemannian metrics for neural networks I: feedforward networks

Author

  • Yann Ollivier
Abstract

We describe four algorithms for neural network training, each adapted to different scalability constraints. These algorithms are mathematically principled and invariant under a number of transformations in data and network representation, so that performance is independent of such choices. They are obtained from the setting of differential geometry and are based either on the natural gradient using the Fisher information matrix, or on Hessian methods, scaled down in a specific way to allow for scalability while keeping some of their key mathematical properties.

The most standard way to train neural networks, backpropagation, has several known shortcomings. Convergence can be quite slow. Backpropagation is sensitive to data representation: for instance, even such a simple operation as exchanging 0’s and 1’s on the input layer affects performance (Figure 1), because this amounts to changing the parameters (weights and biases) in a non-trivial way, resulting in different gradient directions in parameter space, and in better performance with 1’s than with 0’s. (In the related context of restricted Boltzmann machines, the standard training technique by gradient ascent favors setting hidden units to 1, for similar reasons [OAAH11, Section 5].)

This specific phenomenon disappears if the hyperbolic tangent is used as the activation function instead of the logistic function, or if the input is normalized. But this will not help if, for instance, the activities of internal units in a multilayer network are not centered on average. Scaling also affects performance: a common recommendation [LBOM96] is to use 1.7159 tanh(2x/3) instead of just tanh(x) as the activation function.

It would be interesting to have algorithms whose performance is insensitive to particular choices such as scaling factors in network construction, parameter encoding, or data representation. We call an algorithm invariant, or intrinsic, if applying a change of variables to the parameters and activities results in the same learning trajectory. This is not the case for backpropagation (even after adjusting the learning rate): for instance, changing from sigmoid to tanh activation amounts to dividing the connection weights by 4 and shifting the biases accordingly, a change of variables under which plain gradient directions are not preserved.
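To make the invariance issue concrete, here is a minimal NumPy sketch (a hypothetical two-layer toy network; the names and setup are illustrative, not code from the paper). It checks that the sigmoid-to-tanh change of variables leaves the computed function unchanged while the plain gradient direction, pulled back to the original coordinates, changes; an invariant method such as the natural gradient, which preconditions the gradient with the inverse Fisher matrix (θ ← θ − η F(θ)⁻¹ ∇L(θ)), would by construction yield the same trajectory in either parameterization.

```python
# Minimal sketch (NumPy; toy network, illustrative names, not code from the
# paper): converting a sigmoid network into an equivalent tanh network
# changes the parameters, and plain gradient directions are not preserved
# under this change of variables.
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), rng.normal(size=3)  # input -> hidden
w2, b2 = rng.normal(size=3), rng.normal()             # hidden -> linear output
x = rng.normal(size=2)

# Sigmoid network: h = sigmoid(W1 x + b1), y = w2 . h + b2.
h_sig = sigmoid(W1 @ x + b1)
y_sig = w2 @ h_sig + b2

# Equivalent tanh network. Since tanh(v) = 2 sigmoid(2v) - 1, halving the
# incoming weights and biases gives h_tanh = 2 h_sig - 1; the outgoing
# weights are then halved as well, and the output bias absorbs the shift.
h_tanh = np.tanh((W1 / 2) @ x + b1 / 2)
y_tanh = (w2 / 2) @ h_tanh + (b2 + w2.sum() / 2)
assert np.isclose(y_sig, y_tanh)  # same function, different parameters

# Plain gradient of y with respect to the output weights in each
# parameterization, pulled back to the original (sigmoid) coordinates:
g_sig = h_sig                    # d y / d w2 in the sigmoid network
g_tanh_pulled_back = 2 * h_tanh  # a step on w2/2 moves w2 by twice as much
cos = g_sig @ g_tanh_pulled_back / (
    np.linalg.norm(g_sig) * np.linalg.norm(g_tanh_pulled_back))
print(f"cosine between the two gradient directions: {cos:.3f}")  # != 1 in general
```

In this sketch only the first layer’s weights are halved, because the raw inputs are not rescaled; between two converted hidden units the two factors of 2 combine, giving the division of connection weights by 4 mentioned above.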

Similar resources

Principles of Riemannian Geometry in Neural Networks

This study deals with neural networks in the sense of geometric transformations acting on the coordinate representation of the underlying data manifold from which the data is sampled. It forms part of an attempt to construct a formalized general theory of neural networks in the setting of Riemannian geometry. From this perspective, the following theoretical results are developed and proven for ...

Multi-View Face Detection in Open Environments using Gabor Features and Neural Networks

Multi-view face detection in open environments is a challenging task, due to wide variations in illumination, face appearance, and occlusion. In this paper, a robust method for multi-view face detection in open environments, using a combination of Gabor features and neural networks, is presented. Firstly, the effect of changing the Gabor filter parameters (orientation, frequency, standard d...

Benchmarking Feed-Forward Neural Networks: Models and Measures

Existing metrics for the learning performance of feed-forward neural networks do not provide a satisfactory basis for comparison because the choice of the training epoch limit can determine the results of the comparison. I propose new metrics which have the desirable property of being independent of the training epoch limit. The efficiency measures the yield of correct networks in proportion to...

Evaluating the performance of different artificial intelligence methods and a statistical method in runoff estimation (case study: the Shahid Noori watershed, Kakhk, Gonabad)

Rainfall-runoff models have been used in hydrology and runoff estimation for many years, but despite the numerous existing models, the regular release of new ones shows that there is still no model that provides estimates with consistently high accuracy and performance. In order to achieve the best results, modeling and identification of factors affecting the output of the model i...

Numerical solution of fuzzy linear Fredholm integro-differential equation by fuzzy neural network

In this paper, a novel hybrid method based on the learning algorithm of fuzzy neural networks and Newton-Cotes methods with positive coefficients for the solution of linear Fredholm integro-differential equations of the second kind with fuzzy initial value is presented. Here the neural network is considered as part of a large field called neural computing or soft computing. We propose a learning algorithm from ...

Forecasting Sunspot Numbers with Neural Networks

This paper presents a feedforward neural network approach to sunspot forecasting. The sunspot series were analyzed with feedforward neural networks formalized on the basis of statistical models; the statistical models were used as comparison models, along with recurrent neural networks. The feedforward networks had 24 inputs (depending on the number of predictor variables), one hidden layer with 20 ...

Publication date: 2013